Combining information on structure and content to automatically annotate natural science spreadsheets

Authors

  • Martine G. de Vos
  • Jan Wielemaker
  • Hajo Rijgersberg
  • Guus Schreiber
  • Bob J. Wielinga
  • Jan L. Top
Abstract

In this paper we propose several approaches for the automatic annotation of natural science spreadsheets, using a combination of structural properties of the tables and external vocabularies. When designing their spreadsheets, domain scientists implicitly include their domain model in the content and structure of the spreadsheet tables. This domain model remains implicit, yet it is essential for interpreting the spreadsheet data unambiguously. The overall objective of this research is to make the underlying domain model explicit, to facilitate evaluation and reuse of these data. We present our annotation approaches by describing five structural properties of natural science spreadsheets that may pose challenges to annotation and, at the same time, provide additional information on the content. For example, the main property we describe is that, within a spreadsheet table, semantically related terms are grouped in rectangular blocks. For each of the five structural properties we suggest an annotation approach that combines heuristics on the property with knowledge from external vocabularies. We evaluate our approaches in a case study with a set of existing natural science spreadsheets, comparing the annotation results with a baseline based on purely lexical matching. Our case study results show that combining information on structural properties of spreadsheet tables with lexical matching to external vocabularies results in higher precision and recall in the annotation of individual terms. We show that the semantic characterization of blocks of spreadsheet terms is an essential first step in identifying relations between cells in a table. As such, the annotation approaches presented in this study provide the basic information needed to construct the domain model of scientific spreadsheets.
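
The idea of combining lexical matching with the block property can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `VOCABULARY`, the concept types `quantity` and `unit`, and the function names `lexical_candidates` and `annotate_block` are all hypothetical, and the majority-vote heuristic is only one plausible way to exploit the assumption that terms in a rectangular block are semantically related.

```python
from collections import Counter

# Hypothetical toy vocabulary: lowercase label -> candidate concept types.
# A real external vocabulary would be far larger and richer.
VOCABULARY = {
    "temperature": {"quantity"},
    "mass": {"quantity"},
    "ph": {"quantity"},
    "m": {"unit", "quantity"},  # ambiguous: metre, or a symbol for mass
    "kg": {"unit"},
    "s": {"unit"},
}


def lexical_candidates(term):
    """Baseline: exact, case-insensitive lookup of a cell term."""
    return VOCABULARY.get(term.strip().lower(), set())


def annotate_block(terms):
    """Annotate one block of cells assumed to hold related terms.

    Unambiguous lexical matches vote for a dominant block type; that
    type then resolves ambiguous cells and labels unmatched ones.
    """
    candidates = [lexical_candidates(t) for t in terms]
    votes = Counter(next(iter(c)) for c in candidates if len(c) == 1)
    if not votes:
        # No unambiguous match, so there is no evidence for a block type.
        return [None] * len(terms)
    block_type = votes.most_common(1)[0][0]

    annotations = []
    for c in candidates:
        if len(c) == 1:
            annotations.append(next(iter(c)))  # keep the lexical match
        elif not c or block_type in c:
            annotations.append(block_type)     # inferred from the block
        else:
            annotations.append(None)           # candidates conflict with block
    return annotations


unit_block = annotate_block(["kg", "s", "m", "mol"])
quantity_block = annotate_block(["Temperature", "Mass", "m"])
```

In this sketch the ambiguous term `m` is annotated as a unit in the first block and as a quantity in the second, and the unmatched term `mol` inherits the type of its block. Purely lexical matching would leave both cases unresolved.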

Similar resources

How Plausible is Automatic Annotation of Scientific Spreadsheets?

It is possible to automatically annotate a natural science spreadsheet using lexical matching, given that the tables in these spreadsheets meet a number of requirements regarding the content. Results of a survey show that most of the existing natural science spreadsheets deviate from the ideal situation. We propose to complement lexical matching with both heuristics and knowledge from external ...

RDF Modelling and SPARQL Processing of SQL Abstract Syntax Trees

Corentin Follenfant, Olivier Corby, Fabien Gandon, and David Trastour (INRIA Sophia Antipolis Méditerranée; SAP Research). Most enterprise systems rely on relational databases, and therefore SQL queries, to populate dynamic documents such as business intelligence reports, dashboards or spreadsheets. These queries repr...

Introduction to Natural Language Syntax and Parsing Lecture 1: Automatic Linguistic Annotation

Automatic Linguistic Annotation We would like to automatically annotate linguistic units (typically sentences) with some linguistic structure, in order to facilitate various NLP tasks and applications, such as (semantic) search, question answering, information extraction, machine translation, and so on. We might also want to model some aspects of linguistic structure for linguistic, or cognitiv...

YouTube Scale, Large Vocabulary Video Annotation

As video content on the web continues to expand, it is increasingly important to properly annotate videos for effective search and mining. While the idea of annotating static imagery with keywords is relatively well known, the idea of annotating videos with natural language keywords to enhance search is an important emerging problem with great potential to improve the quality of video search. H...

RightField: Embedding Ontology Term Selection into Spreadsheets for the Annotation of Biological Data

RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Microsoft Excel spreadsheets. Individual cells, columns, or rows can be restricted to particular ranges of allowed classes or instances from chosen ontologies. Informaticians with experience in ontologies and data annotation prepare RightField-enabled spreadshee...

Journal title:
  • Int. J. Hum.-Comput. Stud.

Volume 103

Publication date: 2017